Deep Learning for Automated Detection and Localization of Traumatic Abdominal Solid Organ Injuries on CT Scans

Computed tomography (CT) is the most commonly used diagnostic modality for blunt abdominal trauma (BAT) and significantly influences management. Deep learning models (DLMs) have shown great promise in enhancing various aspects of clinical practice, but there is limited literature on their use specifically for trauma image evaluation. In this study, we developed a DLM that detects solid organ injuries to help medical professionals rapidly identify life-threatening injuries. The study enrolled patients from a single trauma center who underwent abdominal CT between 2008 and 2017. Patients with spleen, liver, or kidney injury were categorized as the solid organ injury group, and all others were considered negative cases. Only images acquired at the trauma center were enrolled. Scans acquired in the last year of the study period were designated as the test set, and the remaining scans were used to train and validate the detection models. The performance of each model was assessed using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, positive predictive value, and negative predictive value at the operating point with the best Youden index. The models were developed using 1302 (87%) scans and tested on 194 (13%) scans. The spleen injury model achieved an accuracy of 0.938 and a specificity of 0.952. The liver injury model achieved an accuracy of 0.820 and a specificity of 0.847. The kidney injury model achieved an accuracy of 0.959 and a specificity of 0.989. We developed a DLM that automates the detection of solid organ injuries on abdominal CT scans with acceptable diagnostic accuracy. It cannot replace clinicians, but it may serve as a tool to accelerate therapeutic decision-making in trauma care.


Background
Blunt abdominal trauma (BAT), resulting from incidents such as traffic crashes, falls, assaults, or occupational accidents, is common in the trauma bay [1,2]. Studies have reported a high prevalence of intra-abdominal injury following BAT, with rates ranging from 12 to 15% [3]. Among these injuries, the spleen, liver, and kidneys are the most frequently affected organs, constituting approximately 80% of all visceral injuries [4]. Since the 1980s, there has been a significant shift from surgical to nonoperative management (NOM) for BAT, with numerous studies demonstrating satisfactory outcomes [5][6][7][8]. Advances in diagnostic modalities, particularly computed tomography (CT), have played a crucial role in making NOM a viable option for managing BAT patients [9].
CT scans provide accurate assessments of organ injury severity [10], hemoperitoneum, contrast extravasation [11], and viscus injury [12] and are crucial in predicting the need for prompt intervention [13], making CT the preferred diagnostic tool for hemodynamically stable patients. The extensive use of CT and a growing body of literature demonstrating promising results have led to the widespread acceptance of NOM as the standard therapeutic strategy [14][15][16][17]. While CT and other advanced technologies yield informative results, clinicians must have the skills to detect and differentiate abnormalities in high-resolution images. Although CT images can depict injuries, frontline clinicians may misdiagnose them because of inexperience, a crowded working environment, or excessive workload [18,19]. Higher diagnostic accuracy therefore depends not only on the capabilities of the imaging modality but also on the clinician's expertise.
Deep learning (DL) algorithms have achieved diagnostic accuracy comparable to that of experts in medical imaging [20], whether applied to plain radiographs [21] or to advanced modalities such as CT or magnetic resonance imaging (MRI) [22,23]. As we enter an era of collaboration between human expertise and computational power, DL algorithms have the potential to transform medical practice, particularly by alleviating the workload of healthcare providers in emergency settings [24]. Despite these advances, trauma-related algorithms that assist trauma surgeons in managing time-sensitive and life-threatening injuries remain limited [25][26][27][28]. Moreover, the clinical need for an explainable and transparent AI model to support emergency radiologists and clinicians remains unaddressed [29]. Continued development and implementation of specialized DL models in trauma care could further improve patient outcomes and help healthcare professionals deliver swift and effective treatment. Previous studies of DL in torso trauma imaging focused mainly on automatically grading specific organ injuries [30][31][32], segmenting injured areas [22,25], detecting active bleeding [33], and quantifying hemorrhage in chest and abdominopelvic CT scans [34][35][36][37]. Slice-level labeling in trauma imaging, particularly of deformed injured organs, demands extensive effort from specialists. Three-dimensional (3D) DL architectures allow scan-level labeling instead, significantly reducing the labeling burden [38]. Additionally, novel open-source frameworks for 3D organ segmentation open promising avenues for broader applications in CT imaging [39]. In the current study, we developed a DL-based algorithm that combines an open-source segmentation model with a 3D classification network. The algorithm is designed to detect and diagnose visceral traumatic injuries, thereby assisting clinicians in handling these lethal injuries. To the best of our knowledge, at the time of submission, this is the first published approach to detect injuries across multiple organs.

Materials and Methods
We selected patients from our trauma registry who underwent contrast-enhanced abdominal CT between May 2008 and December 2017 at Chang Gung Memorial Hospital, Linkou. The clinical information captured for each patient included age, gender, trauma mechanism, Abbreviated Injury Scale (AIS) scores for each body region, Injury Severity Score (ISS), interventions performed, final diagnosis, and outcome. Only images acquired in our hospital were enrolled; CT scans obtained from external hospitals were excluded. All images were acquired on a TOSHIBA Aquilion One 320 scanner in the emergency CT room. Eligible protocols included routine abdominal scans, multiphase abdominal scans, and whole-body CT (venous phase only), all with 5-mm axial slice thickness. When a patient had multiple scans, only the earliest scan was included. We used only the venous phase of each scan, excluding images of poor quality, images with artifacts, postoperative scans, and scans lacking an appropriate venous phase.
To ensure accurate labels, a trauma surgeon with expertise in medical image analysis and 13 years of clinical experience reviewed the images, original radiologist reports, trauma registry, and medical records to determine, as scan-level annotations, whether each patient had spleen, liver, or kidney injuries. When the image or report was equivocal, a senior radiologist with a trauma subspecialty was consulted to determine the true label. Organ injuries were graded according to the 2018 version of the AAST organ injury scale [40], ensuring standardized and consistent assessment across all cases.
To increase the variability of the abdominal CT images, we gathered an additional dataset from patients presenting to the emergency room with acute abdominal diseases such as appendicitis, biliary diseases, hollow organ perforation, intestinal obstruction, ischemic bowel, and similar conditions, all of whom underwent abdominal CT. CT scans depicting injuries to the spleen, liver, or kidneys were categorized as the positive group (solid organ injury), and all other scans were classified as the negative group (no solid organ injury). Because the negative group contained more scans, we randomly undersampled the negative scans to balance the class distribution (Fig. 1).
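Random undersampling of the negative class can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the `undersample_negatives` helper, the 1:1 default ratio, and the fixed seed are hypothetical, since the text reports neither the target ratio nor how the random draw was seeded.

```python
import random


def undersample_negatives(scan_ids, labels, ratio=1.0, seed=42):
    """Keep every positive scan and a random subset of negative scans.

    ratio: hypothetical target number of negatives kept per positive.
    seed: fixed so that the sampled subset is reproducible.
    """
    positives = [s for s, y in zip(scan_ids, labels) if y == 1]
    negatives = [s for s, y in zip(scan_ids, labels) if y == 0]
    # Never request more negatives than exist.
    n_keep = min(len(negatives), int(round(ratio * len(positives))))
    rng = random.Random(seed)
    kept_negatives = rng.sample(negatives, n_keep)
    return positives + kept_negatives
```

Every positive scan is kept and only negatives are dropped, so no injury cases are lost. According to the Results, balancing was applied to the development set only, while the test set keeps its natural class distribution.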
To reduce potential selection bias, scans acquired in the last year of the study period were set aside as an independent test set, and the remaining scans formed the development dataset. This temporal split provides a more robust evaluation of the model's performance on unseen data. This study was approved by the Institutional Review Board (IRB) of the Chang Gung Medical Foundation (No. 202002333B0).
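The temporal hold-out amounts to a date cutoff. In this sketch, `temporal_split` and the record layout `(scan_id, acquisition_date)` are hypothetical, and the cutoff date passed in is an assumption standing in for "the last year" of the study period.

```python
from datetime import date


def temporal_split(records, cutoff):
    """Split (scan_id, acquisition_date) records at a cutoff date.

    Scans acquired on or after `cutoff` form the held-out test set;
    earlier scans form the development set.
    """
    development = [r for r in records if r[1] < cutoff]
    test = [r for r in records if r[1] >= cutoff]
    return development, test


# Hypothetical example records; the cutoff is an illustrative assumption.
records = [("a", date(2012, 3, 1)), ("b", date(2017, 5, 2)), ("c", date(2016, 12, 31))]
dev, test = temporal_split(records, cutoff=date(2017, 1, 1))
```

Because only the earliest scan per patient was kept, a date-based split also keeps each patient in a single partition.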

Image Preprocessing
The original CT scans were first retrieved in Digital Imaging and Communications in Medicine (DICOM) format from the Picture Archiving and Communication System (PACS). The venous phase scan of each patient was identified and converted to the Neuroimaging Informatics Technology Initiative (NIfTI) format to facilitate 3D processing. Before further processing, an intensity window of −50 to 250 HU was applied. During training, we augmented the image dataset with translation, rotation, scaling, and elastic distortion to increase the diversity and variability of the training samples.
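The windowing step clips voxel intensities to the stated −50 to 250 HU range. The sketch below assumes the NIfTI volume has already been loaded as an array in Hounsfield units (for example with NiBabel's `nibabel.load(path).get_fdata()`); the rescaling to [0, 1] after clipping is an assumption, since the text specifies only the window.

```python
import numpy as np


def window_hu(volume_hu, lower=-50.0, upper=250.0):
    """Clip a CT volume (in Hounsfield units) to [lower, upper] and rescale to [0, 1]."""
    volume_hu = np.asarray(volume_hu, dtype=np.float32)
    clipped = np.clip(volume_hu, lower, upper)
    # Linear map: lower -> 0.0, upper -> 1.0 (rescaling is an assumed convention).
    return (clipped - lower) / (upper - lower)
```

Air (about −1000 HU) and dense bone or contrast above 250 HU saturate at the ends of the range, leaving the soft-tissue contrast of the solid organs spread over the full scale.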
We designed a two-step DL algorithm to detect specific solid organ injuries, as shown in Fig. 2. First, to reduce the labeling effort, we applied an open-access organ segmentation model, TotalSegmentator v1.3 [39], to generate segmentation masks of the solid organs, including the spleen, liver, and kidneys. Each generated mask was then converted into a 3D cuboid bounding box that also includes the background surrounding the target organ. The 3D cuboid of each organ in the development dataset was used to train the injury classification network. All images were resized to 64 × 64 × 64 before being fed into the classification model.
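The mask-to-cuboid cropping and resizing can be sketched with NumPy. The helper names and the 8-voxel margin are hypothetical (the text does not say how much surrounding background the cuboid includes), and nearest-neighbour index sampling is used only to keep the example dependency-free; trilinear interpolation, for example via `scipy.ndimage.zoom`, would be a smoother alternative.

```python
import numpy as np


def mask_to_bbox(mask, margin=8):
    """Return slices for the bounding cuboid of a binary mask, padded by `margin` voxels."""
    coords = np.argwhere(mask)
    if coords.size == 0:
        raise ValueError("segmentation mask is empty")
    lo = np.maximum(coords.min(axis=0) - margin, 0)             # clip at volume start
    hi = np.minimum(coords.max(axis=0) + margin + 1, mask.shape)  # clip at volume end
    return tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))


def resize_nearest(volume, size=(64, 64, 64)):
    """Resample a 3D array to `size` by nearest-neighbour index sampling."""
    idx = [np.linspace(0, s - 1, n).round().astype(int) for s, n in zip(volume.shape, size)]
    return volume[np.ix_(*idx)]


def crop_organ(volume, mask, margin=8, size=(64, 64, 64)):
    """Crop the organ cuboid (organ plus surrounding background) and resize it."""
    return resize_nearest(volume[mask_to_bbox(mask, margin)], size)
```

Because the cuboid is derived automatically from the segmentation mask, no manual bounding-box annotation is needed, which is the labeling saving the two-step design relies on.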

Solid Organ Injury Classification Network
As a baseline, a whole-image injury classification model was trained on entire abdominal CT scans for comparison with the individual two-step organ injury classification models. Each organ-specific model was trained on the cropped cuboid images generated in the organ segmentation step, and the spleen, liver, and kidney injury classification models were trained separately. The cropped organ was fed into a 3D Convolutional Block Attention Module (CBAM) neural network [41] with a binary classification label. The 64 × 64 × 64 input is first processed by two 3D convolutional layers that output four channels. It then passes through three blocks, each comprising a different number of residual blocks with a CBAM as the final layer. The resulting feature maps, of size 8 × 8 × 8, are reduced by global average pooling and passed to a fully connected layer. The entire network is trained with the angular softmax loss, which encourages features that are discriminative in terms of their angles. To eliminate interference from other organs, each organ model judges only its own organ; for example, the spleen injury classification model determines only whether the spleen is injured, regardless of injuries to other intra-abdominal organs. We use the Grad-CAM algorithm [42] to visualize whether the model focused on the target lesion and thereby assess the reliability of the result. The outputs of the three organ models are also combined to calculate an overall solid organ injury detection rate.
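A CBAM applies channel attention followed by spatial attention to a feature map [41]. The following PyTorch sketch shows one 3D CBAM module of the kind placed at the end of each block. The class names, the reduction ratio of 2, and the 7 × 7 × 7 spatial kernel are assumptions borrowed from the generic CBAM design, not details reported for this network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelAttention3D(nn.Module):
    """Reweight channels with a shared MLP over average- and max-pooled descriptors."""

    def __init__(self, channels: int, reduction: int = 2):
        super().__init__()
        hidden = max(channels // reduction, 1)
        # 1x1x1 convolutions act as a shared two-layer MLP on (N, C, 1, 1, 1) descriptors.
        self.mlp = nn.Sequential(
            nn.Conv3d(channels, hidden, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv3d(hidden, channels, kernel_size=1, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = self.mlp(F.adaptive_avg_pool3d(x, 1))
        mx = self.mlp(F.adaptive_max_pool3d(x, 1))
        return x * torch.sigmoid(avg + mx)


class SpatialAttention3D(nn.Module):
    """Reweight voxels using channel-wise average and max maps."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv3d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn


class CBAM3D(nn.Module):
    """Channel attention followed by spatial attention (the original CBAM ordering)."""

    def __init__(self, channels: int, reduction: int = 2, kernel_size: int = 7):
        super().__init__()
        self.channel = ChannelAttention3D(channels, reduction)
        self.spatial = SpatialAttention3D(kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.spatial(self.channel(x))
```

The module preserves the input shape, so it can be appended to a residual stage without changing the surrounding layers. With only four channels after the stem, a small reduction ratio keeps the hidden MLP layer non-empty.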

Software and Statistical Analysis
The experiments were run on a workstation equipped with an Intel(R) Core(TM) i9-10900X CPU at 3.70 GHz, 96 GB RAM, and NVIDIA TITAN RTX and GeForce RTX 3090 GPUs, running Ubuntu 18.04. The entire pipeline was implemented in Python v3.6.9 and PyTorch v1.6.0. Image preprocessing used Python libraries including dicom2nifti, NiBabel, SciPy, and OpenCV. Image annotation was conducted with the Medical Imaging Interaction Toolkit (MITK), and data augmentation was performed with the tools provided by the Medical Open Network for Artificial Intelligence (MONAI).
Statistical analysis was performed in R version 4.2.2 with the "pROC" package. Classification performance was evaluated with a confusion matrix to assess accuracy, sensitivity, specificity, false positive rate (FPR), false negative rate (FNR), positive predictive value (PPV), and negative predictive value (NPV). We also plotted the receiver operating characteristic (ROC) curve and calculated the area under the ROC curve (AUROC) (Fig. 3). The optimal cutoff value was determined with the Youden index, and all models were compared at the cutoff with the best Youden index. Confidence intervals of the ROC curve were estimated by bootstrapping. Performance metrics of each organ-specific model were compared with those of the whole-image model using McNemar's test or the binomial proportions test, as appropriate. For the demographic data, continuous variables were compared using the Kruskal-Wallis rank-sum test and categorical variables using the chi-square test.
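The confusion-matrix metrics and the Youden-index cutoff can be illustrated in Python. The analysis itself was done in R with pROC (its `coords` function accepts `best.method = "youden"`); this NumPy version is only a sketch, and the helper names and the `score >= cutoff` convention are assumptions.

```python
import numpy as np


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, FPR, FNR, PPV, and NPV from binary calls."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))

    def ratio(a, b):
        return a / b if b else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "fpr": ratio(fp, fp + tn),
        "fnr": ratio(fn, fn + tp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def youden_threshold(y_true, scores):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A scan is called positive when its score is >= the cutoff.
    """
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    best_cut, best_j = None, -np.inf
    for cut in np.unique(scores):  # every observed score is a candidate cutoff
        m = confusion_metrics(y_true, scores >= cut)
        j = m["sensitivity"] + m["specificity"] - 1.0
        if j > best_j:
            best_cut, best_j = float(cut), j
    return best_cut, best_j
```

Ties in J are resolved here by keeping the lowest cutoff; other software may resolve ties differently.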

Results
We gathered a total of 1496 venous phase abdominal CT scans from an equal number of patients. Of these, 194 scans acquired in the last year were reserved as an independent test set, and the remaining 1302 scans formed the development dataset. Table 1 displays the demographic characteristics of this dataset. To improve algorithm training, we balanced the classes within the development set. In the test set, the proportion of positive cases is relatively small, reflecting the clinical distribution. Among the 72 positive cases in the test set, 16 patients (22.2%) had more than one solid organ injury.
The baseline whole-image model achieved a reasonably good AUROC of 0.842; however, it could not identify the specific injured organ, and its visualization heatmaps failed to pinpoint the site of injury. Of the 72 patients with solid organ injuries, 19 (26.4%) were missed by this model. In contrast, the spleen injury classification model achieved a high accuracy of 0.938 and correctly identified 25 of the spleen-injured patients (86.2% sensitivity) with eight false positives (5% FPR). The four patients the model missed all had low-grade splenic injuries. The liver injury model showed a slight improvement over the whole-image model, with an AUROC of 0.869. Nonetheless, it still failed to identify 12 patients with liver injuries (16.7% FNR) and produced 23 false positives (15% FPR). For the kidney model, evaluation of both kidneys demonstrated a high specificity of 96.6% (6 false positives, 3.4% FPR), but the sensitivity was relatively low at 83.3% (6 false negatives, 33.3% FNR). Combining the three organ models, the overall accuracy, sensitivity, and specificity for detecting solid organ injuries in the CT scans reached 84.0%, 87.5%, and 82.0%, respectively (Table 2).

Fig. 2 A comprehensive overview of the algorithm design for solid organ injury detection
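The text does not state how the three organ-level outputs were merged into the scan-level result. A natural reading, assumed in this sketch, is an OR rule: a scan counts as positive if any organ model is positive at its own cutoff. The function names are hypothetical.

```python
def combine_organ_calls(spleen, liver, kidney):
    """Flag a scan as injured if any organ model is positive (assumed OR rule)."""
    return [bool(s or l or k) for s, l, k in zip(spleen, liver, kidney)]


def sensitivity_specificity(y_true, y_pred):
    """Scan-level sensitivity and specificity of the combined calls."""
    tp = sum(bool(t and p) for t, p in zip(y_true, y_pred))
    tn = sum(bool((not t) and (not p)) for t, p in zip(y_true, y_pred))
    pos = sum(bool(t) for t in y_true)
    neg = len(y_true) - pos
    return tp / pos, tn / neg
```

Under an OR rule, false positives from each organ model accumulate, so the combined call trades some specificity for sensitivity.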
The p value of each organ-specific model was calculated relative to the whole-image model.
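McNemar's test compares two classifiers on the same scans using only the discordant pairs. A minimal sketch with the usual continuity correction follows; it mirrors the default of R's `mcnemar.test` (`correct = TRUE`), but the function name and the boolean correct-call inputs are assumptions, and the binomial proportions test mentioned in the Methods is not shown.

```python
import math


def mcnemar_test(correct_a, correct_b):
    """Continuity-corrected McNemar test for two models scored on the same scans.

    correct_a, correct_b: per-scan booleans, True where that model's call was correct.
    Returns (chi-square statistic, two-sided p value).
    """
    b = sum(1 for ra, rb in zip(correct_a, correct_b) if ra and not rb)  # A right, B wrong
    c = sum(1 for ra, rb in zip(correct_a, correct_b) if rb and not ra)  # B right, A wrong
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Chi-square with 1 df: P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p
```

Scans on which both models agree, whether both correct or both wrong, do not affect the statistic.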

Discussion
In this study, we developed a DL algorithm that automates the detection of solid organ injuries in abdominal CT scans. The diagnostic accuracy was 0.938 (0.902-0.969), 0.820 (0.763-0.871), and 0.959 (0.933-0.979) for splenic, hepatic, and renal injuries, respectively, with satisfactory sensitivity and specificity. Notably, for splenic and renal injuries, the model showed good diagnostic accuracy, and its heatmaps indicated the location of the injuries. The accuracy for hepatic injuries is comparatively lower, possibly because the cuboid cropping method includes neighboring organs and introduces noise during model training. However, the diagnostic accuracy remains acceptable. To our knowledge, this study is the first to introduce a multitask DL model specifically designed for trauma detection in abdominal CT scans. To enhance the model's explainability, we incorporated Grad-CAM to overlay heatmaps on the examined images. This technique is commonly applied in medical image analysis to visualize the parts of the input image that dominate the prediction [43]. As shown in Fig. 4, the heatmaps focused on the injured part of the organ in our test set; however, this feature only indicates the approximate area the model relied on for its decision rather than precisely contouring the injured area. Segmenting the injured part requires a huge labeling effort, especially given the ambiguous boundaries in the current task, and remains a challenge for future work. The average processing time to generate results on our hardware was only 3 min, and the process can be initiated automatically once image acquisition is complete.
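Grad-CAM weights each channel of a convolutional feature map by the spatially averaged gradient of the class score and keeps the positive part of the weighted sum [42]. The NumPy sketch below assumes the activations and gradients of the last 8 × 8 × 8 feature layer have already been captured (in PyTorch, typically with module hooks); the choice of layer and the nearest-neighbour upsampling by a factor of 8 to the 64 × 64 × 64 input are assumptions.

```python
import numpy as np


def grad_cam_3d(activations, gradients, upsample=1):
    """Grad-CAM map for one 3D feature layer.

    activations, gradients: arrays of shape (C, D, H, W) holding the layer's
    feature maps and the gradient of the target class score with respect to them.
    """
    activations = np.asarray(activations, dtype=float)
    gradients = np.asarray(gradients, dtype=float)
    weights = gradients.mean(axis=(1, 2, 3))          # one weight per channel
    cam = np.tensordot(weights, activations, axes=1)  # weighted channel sum -> (D, H, W)
    cam = np.maximum(cam, 0.0)                        # keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                         # scale to [0, 1]
    for axis in range(3):                             # nearest-neighbour upsampling
        cam = np.repeat(cam, upsample, axis=axis)
    return cam
```

Because the map is computed at the 8 × 8 × 8 resolution, each cell covers an 8-voxel cube of the input, which is one reason the heatmap marks an approximate region rather than a precise lesion contour.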
In trauma care, where time is critical, reducing diagnostic time can lead to earlier detection and intervention, potentially minimizing the risk of significant blood loss and improving patient outcomes. The shift from operative to nonoperative management in patients with BAT has been driven primarily by advances in diagnostic tools and a reduction in complications associated with operative procedures [8,44]. The critical step in this process is the careful selection of suitable patients, and abdominal CT scans play a vital role in assisting traumatologists with patient selection [45][46][47]. With DLM support, we achieved high negative predictive values of 0.975 (0.952-0.994), 0.914 (0.876-0.949), and 0.967 (0.945-0.989) for splenic, hepatic, and renal injuries, respectively. A secondary check by clinical experts can further reduce the misdiagnosis rate. The ultimate goal for both clinical data scientists and clinicians is a multitask DLM that significantly accelerates patient evaluation [48]. The concepts demonstrated in this study point to considerable potential for future solid organ injury detection models. The combination of advanced diagnostic tools and DLM support is paving the way for more accurate and efficient patient assessments, ultimately leading to improved trauma care outcomes.
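Confidence intervals such as those above are commonly obtained with a percentile bootstrap over scans. The text states that bootstrapping was used for the ROC curve but does not describe the resampling behind these metric intervals, so the number of resamples, the percentile method, the seed, and the helper names in this sketch are assumptions.

```python
import numpy as np


def npv(y_true, y_pred):
    """Negative predictive value TN / (TN + FN) for boolean arrays."""
    tn = np.sum(~y_true & ~y_pred)
    fn = np.sum(y_true & ~y_pred)
    return tn / (tn + fn) if (tn + fn) else np.nan


def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap interval for a binary-call metric."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample scans with replacement
        stats.append(metric(y_true[idx], y_pred[idx]))
    lo, hi = np.nanpercentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return metric(y_true, y_pred), (float(lo), float(hi))
```

Resampling whole scans keeps each scan's true label and prediction paired, so the interval reflects sampling variability of the test set itself.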
Detecting traumatic solid organ injuries poses a significant challenge in algorithm development, primarily because of complex image morphology and the frequent occurrence of multiple simultaneous organ injuries. Injured organs often share similar radiological findings, such as hemoperitoneum, hematoma, or contrast extravasation, making it difficult to distinguish between them. Moreover, defining the precise location of the injured part within a specific organ is a daunting task, particularly compared with tasks in other body regions such as brain hemorrhage identification [49]. In the current study, we devised a two-step algorithm to address these challenges. By focusing on specific organs in the first step, we aim to reduce complexity and improve the algorithm's accuracy in detecting organ-specific injuries. Additionally, to enhance the interpretability of the model, we employ a heatmap localization technique in the second step to highlight the injured region within the specific organ. This localization approach greatly enhances the model's explainability, facilitating understanding of and trust in the detection results. Through these advances, our algorithm demonstrates promising results in detecting traumatic solid organ injuries, offering potential benefits for clinical applications. A follow-up study in the future focusing on

Previous research has concentrated on segmentation and the automatic grading of the severity of injured organs. Dreizin et al. used a DL-based segmentation approach, enhanced by decision tree analysis, to predict significant arterial damage in liver trauma, achieving an accuracy of 0.84 [25]. Similarly, Farzaneh et al. proposed a framework for liver trauma detection and quantitative assessment applied to 77 CT scans [50]. Chen et al. developed a four-component algorithm for the automatic grading of spleen injuries [51], and Farzaneh et al. also described automated kidney segmentation for trauma patients using active contour modeling [52]. Zhou et al. used an external attention module and synthetic phase augmentation on a small dataset to improve multiphase splenic vascular injury segmentation with a DeepLab-v3 baseline [33].
Additionally, Tulum et al. developed a computer-aided segmentation system for traumatic kidney analysis [53]. For the classification of organ injury, Wang et al. employed machine learning with 3D active contours to identify spleen injuries, using a dataset of 54 healthy and 45 lacerated spleens; validated with fivefold cross-validation, the method achieved an AUC of 0.91 in the test set [32]. Hamghalam et al. developed a DL model using 608 scans each of injured and noninjured spleens, achieving an accuracy of 0.808 with fivefold cross-validation [54]. Compared with these studies, our work offers an alternative, multitask algorithm design: inputting CT images into our model yields information about multiple solid organ injuries rather than a single organ. This design is particularly well suited to the high-pressure, crowded environment of trauma bays. Developing a generalized algorithm that addresses all findings within the same images can be challenging, especially with limited resources. However, with careful consideration of clinical domain knowledge and well-defined data labeling, it is possible to develop clinically beneficial algorithms tailored to specific problems in medical image analysis. Large technology companies have also shown interest in this field [55,56], and their involvement is expected to drive significant improvements in this technology. A notable advantage of current algorithm trends is their problem-oriented approach, in which algorithms are tailored to specific medical challenges.
Moreover, using automatically cropped images reduces the computation required for classification and lightens the workload on workstations. Although running TotalSegmentator lengthens inference time, the cropping step reduces noise from surrounding organs in the classification task and can thereby improve performance. This, in turn, translates into cost savings, as the investment and equipment required to deploy the AI algorithm can be kept reasonable. While DL algorithms have proven their potential in assisting various healthcare tasks, little attention has been given to their costs and minimal computer system requirements. Innovative DL architectures, such as Transformer-based structures, may deliver superior performance and eliminate the need for the cropping step. However, they require significantly more computational power, particularly for 3D images. Setting up a system capable of deploying DL algorithms often requires additional servers or computers equipped with graphics processing units (GPUs), which increases expenses for institutions. To address this, efforts have been made to optimize GPU usage and minimize the need for high-end computers, making DL algorithms more accessible and cost-effective to implement.
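A rough sense of the saving from cropping and resizing: assuming, for illustration only, a 512 × 512 axial matrix and 100 slices per venous phase scan (neither figure is stated in the text), the classifier input shrinks by two orders of magnitude.

$$
\frac{512 \times 512 \times 100}{64 \times 64 \times 64} = \frac{26{,}214{,}400}{262{,}144} = 100
$$

This counts input voxels only; it excludes the cost of running TotalSegmentator beforehand.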

Limitations
The current study has several limitations. First, the algorithm was trained on images from a single trauma center, raising concerns about its performance on images from other institutions with different CT protocols and acquisition methods. This discrepancy could affect the algorithm's generalizability and its reliability in different clinical settings. Second, the ground truth is based on the radiologist's report and a single annotator's confirmation. Third, the imbalanced distribution of solid organ injuries, with kidney injury being relatively rare compared with spleen and liver injuries, may introduce bias into the performance evaluation, potentially leading to overestimation of the algorithm's accuracy for common injuries and underestimation for less frequent ones. Fourth, the segmentation masks generated by TotalSegmentator may be unreliable for severely deformed organs; because TotalSegmentator was trained on nontrauma images, the cropping step can fail in such cases. Prospective multicenter data collection reflecting the class distribution of clinical trauma scenarios is imperative to ensure the robustness of the algorithm and to address these limitations. Incorporating multiple independent trauma specialist radiologists for image annotation in future studies should not only improve the accuracy of lesion localization and characterization but also address the issue of inconsistent AAST grading agreement [57]. Evaluating the algorithm, with a graphical user interface, on a diverse multicenter dataset will help verify its performance in real-world scenarios across various imaging protocols and organ injury distributions, strengthening its credibility and applicability as a clinical tool.

Conclusion
The developed DLM can assist medical professionals in identifying traumatic solid organ injuries with promising diagnostic accuracy. The algorithm is not intended to replace the expertise and judgment of clinicians; instead, it complements their skills and knowledge. Medical professionals can use this tool to accelerate the diagnostic process and improve the overall efficiency of trauma care.

Fig. 3 The ROC curve and AUROC of each solid organ injury detection model. A Whole image model. B Spleen injury model. C Liver injury model. D Kidney injury model. The shaded area represents the 95% confidence interval

Fig. 4 Visualization examples of each solid organ injury detection model.A The heatmap from the whole image model indicates a failure to localize the spleen injury.B The heatmap from the spleen injury model accurately highlights the lacerated area and hematoma

Table 2
Performance of solid organ injury detection models on the test set